Basic Tutorial

Dana Gibbon

2021-09-07

Background

While agricultural species in artificial breeding programs are often mated in specific ways to maximize yield, harvest, or other desired traits, many breeding programs offer no opportunity for direct mate choice and make no attempt to imitate wild-like mate choice decisions. Mate choice, however, is an important component of sexual reproduction. It can affect individual reproductive success (i.e. the number of offspring that survive to adulthood) (e.g. Drickamer et al. 2000; Reynolds & Gross 1992; Sandvik et al. 2000) and offspring traits such as performance (e.g. Drickamer et al. 2000) and growth (e.g. Reynolds & Gross 1992), and it can also affect evolution (Andersson 1994). Both reproductive success and offspring traits such as growth and performance are important factors to consider when mating individuals for harvest and/or conservation programs (e.g. Martin-Wintle et al. 2015).

One difficulty with imitating wild-like mate choice is that mate choice is complicated, multifaceted (Candolin 2003), and variable across species, individuals, and contexts (Jennions & Petrie 1997; Qvarnström 2001; Bussiere et al. 2008). Fortunately, there are tools that can help us better understand mate choice in other species. For instance, many individual traits important to mate choice have a genetic basis (Chenoweth & Blows 2006). It is therefore possible to use genetic information to (1) predict which partners prospective mates may choose in the wild and (2) apply that information in an artificial breeding program, such as a fish hatchery, which could produce more ‘wild-like’ offspring in the absence of free mate choice.

MultifacetedCHOICE lets users combine two kinds of input: previously known information about the mating preferences of wild individuals of any species at multiple loci (i.e. whether choice favors positive or negative assortment at individual single nucleotide polymorphisms (SNPs)), and genotypes for multiple individuals at many loci.
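To make the idea of positive versus negative assortment a little more concrete, here is a toy illustration. It is not the scoring used by MultifacetedCHOICE; the rule (one point per site where a pair's genotypes are identical at an "assortive" site, or differ at a "disassortive" site) and the coding of genotypes as strings such as "AG" are assumptions made only for this sketch.

```r
# Toy pairwise score for illustration only; NOT the MultifacetedCHOICE algorithm.
# Genotypes are hypothetical strings such as "AG"; heterozygotes must be written
# in a consistent allele order because this sketch compares strings directly.
pair_score <- function(female_geno, male_geno, advantage) {
  stopifnot(length(female_geno) == length(male_geno),
            length(male_geno) == length(advantage))
  same <- female_geno == male_geno
  # "assortive" sites reward matching genotypes, "disassortive" sites reward differences
  points <- ifelse(advantage == "assortive", same, !same)
  sum(points)
}

advantage <- c("assortive", "disassortive", "assortive")
female <- c("AA", "CT", "GG")
male_1 <- c("AA", "CC", "GG")  # matches at sites 1 and 3, differs at site 2
male_2 <- c("AG", "CT", "AG")  # differs at sites 1 and 3, matches at site 2

pair_score(female, male_1, advantage)  # 3
pair_score(female, male_2, advantage)  # 0
```

Under this toy rule, male_1 satisfies every site's preference and male_2 satisfies none; a real analysis would weigh many more loci.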

Quick start

devtools::install_github("danagibbon/MultifacetedCHOICE",
                         build_vignettes = FALSE)
library(MultifacetedCHOICE)
## load data
geno <- read.csv(system.file("extdata", "sample_data.csv", package = "MultifacetedCHOICE"))
meta_data <- read.csv(system.file("extdata", "meta_data.csv", package = "MultifacetedCHOICE"))
allele_info <- read.csv(system.file("extdata", "allele_info.csv", package = "MultifacetedCHOICE"))
## Make the database
DBs <- make_database(gtseq = geno, metadata = meta_data, 
                     allele_info = allele_info)
# set samples
females <- geno$Sample[1:7]
males <- geno$Sample[21:27]
# Check Sample IDs
check_samples(DB = DBs, females = females, 
              males = males, used = FALSE)
# run samples, rank for each sample
all_matings <- get_all_rankings(DB = DBs, females = females, males = males,
                                type = "all_alleles")
# Rank Matches
tips <- rank_all_mates(females, males, ranked_list=all_matings)

Details

Load MultifacetedCHOICE

library(MultifacetedCHOICE)
library(DT)  # provides datatable(), used below to display the example tables

Input Data

You need to have the following three data frames set up ahead of time.

GT-seq output

Contains:

  • Sample IDs
  • Raw Reads
  • On Target Reads = reads with the forward primer sequence AND the probe sequence / reads with the forward primer sequence
  • Percent on Target
  • Percent of sites with coverage
  • IFI: the Individual Fuzziness Index score for each sample, output by this version of the GT-seq genotyper. It measures DNA cross-contamination and is calculated from read counts of background signal at homozygous and No-Call loci; lower scores are better. (https://github.com/GTseq/GTseq-Pipeline/blob/master/GTseq_Genotyper_v3.pl)
  • All remaining columns are sites (genotype calls)

The site column names need to match the allele IDs in the allele info dataframe
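Before building the database, it can help to confirm that every site column in the genotype table has a matching identifier in the allele info table. The sketch below uses small toy data frames; the summary column names and the identifier column name Site_ID are placeholders, so substitute the names used in your own files.

```r
# Toy stand-ins for the GT-seq output and allele info (column names are hypothetical)
gtseq_toy <- data.frame(
  Sample = c("s1", "s2"),
  Raw.Reads = c(1000, 1200),
  site_01 = c("AA", "AG"),
  site_02 = c("CC", "CT")
)
allele_info_toy <- data.frame(Site_ID = c("site_01", "site_02", "site_03"))

summary_cols <- c("Sample", "Raw.Reads")          # non-genotype columns to skip
site_cols <- setdiff(names(gtseq_toy), summary_cols)

# Site columns with no entry in the allele info: these would need fixing
not_in_info <- setdiff(site_cols, allele_info_toy$Site_ID)
if (length(not_in_info) > 0) {
  stop("Site columns missing from allele info: ", paste(not_in_info, collapse = ", "))
}

# Allele info entries with no genotype column: worth a warning, not necessarily an error
unused <- setdiff(allele_info_toy$Site_ID, site_cols)
if (length(unused) > 0) {
  message("Allele info sites without genotype data: ", paste(unused, collapse = ", "))
}
```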

geno <- read.csv(system.file("extdata", "sample_data.csv", package = "MultifacetedCHOICE"))
datatable(head(geno)[,1:10],
          filter = 'top',
          rownames = TRUE,
          extensions = 'Buttons',
          options = list(pageLength = 10,
                         dom = 'Bfrtip', 
                         buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))

Meta Data

This dataframe must include the following columns:

  • Sample IDs (must match Sample IDs from the GT-seq output)
  • Sex

Optional columns (you can add any you want), for example:

  • Date
  • Jack
  • Measurements

meta_data <- read.csv(system.file("extdata", "meta_data.csv", package = "MultifacetedCHOICE"))
datatable(head(meta_data),
          filter = 'top',
          rownames = TRUE,
          extensions = 'Buttons',
          options = list(pageLength = 10,
                         dom = 'Bfrtip', 
                         buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))
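A quick consistency check can catch genotyped samples that have no metadata row, or metadata rows with no sex recorded. This is a sketch on toy data; the column names Sample and Sex and the codes "F" and "M" are assumptions, so adapt them to your own meta_data.

```r
# Toy inputs: sample IDs from the genotype table and a small metadata table
geno_ids <- c("s1", "s2", "s3")
meta_toy <- data.frame(
  Sample = c("s1", "s2", "s3", "s4"),
  Sex = c("F", "M", NA, "M"),
  Date = c("2021-09-01", "2021-09-01", "2021-09-02", "2021-09-02")
)

no_meta <- setdiff(geno_ids, meta_toy$Sample)   # genotyped but missing from metadata
no_sex <- meta_toy$Sample[is.na(meta_toy$Sex)]  # metadata rows lacking a Sex entry

no_meta  # character(0)
no_sex   # "s3"
```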

Allele Information

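A data frame with the following columns:

  • Allele ID: in the form chrom:position
  • Chromosome
  • Position
  • Site ID: must match the site column names in the GT-seq output
  • Advantage: “assortive” or “disassortive”, indicating whether mate choice favors positive or negative assortment at that site
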
allele_info <- read.csv(system.file("extdata", "allele_info.csv", package = "MultifacetedCHOICE"))
datatable(head(allele_info),
          filter = 'top',
          rownames = TRUE,
          extensions = 'Buttons',
          options = list(pageLength = 10,
                         dom = 'Bfrtip', 
                         buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))
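If your allele table has separate chromosome and position columns, the Allele ID in the chrom:position form described above can be built with paste0(). The sketch below also checks that every Advantage value is one of the two labels listed above. The column names and values are toy placeholders, not the contents of allele_info.csv.

```r
# Toy allele info; column names are placeholders for your own file's headers
allele_toy <- data.frame(
  Chromosome = c("chr1", "chr1", "chr7"),
  Position = c(10452L, 88210L, 3301L),
  Site_ID = c("site_01", "site_02", "site_03"),
  Advantage = c("assortive", "disassortive", "assortive")
)

# Build "chrom:position" identifiers
allele_toy$Allele_ID <- paste0(allele_toy$Chromosome, ":", allele_toy$Position)
allele_toy$Allele_ID  # "chr1:10452" "chr1:88210" "chr7:3301"

# Stop if any Advantage value is not one of the two accepted labels
bad <- !(allele_toy$Advantage %in% c("assortive", "disassortive"))
stopifnot(!any(bad))
```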

Make Database

Pass the three data frames described above to make_database():

## Make the database
DBs <- make_database(gtseq = geno, metadata = meta_data, 
                     allele_info = allele_info)
#> All Sample IDs found
#> All loci allele IDs found
#> Warning: Closing open result set, pending rows

#> Warning: Closing open result set, pending rows

#> Warning: Closing open result set, pending rows

Get all possible matches

Input:

# set samples
females <- geno$Sample[1:7]
print(females)
#> [1] "i036_A01_1_M001" "i036_A02_1_M009" "i036_A03_1_M017" "i036_A04_1_M025"
#> [5] "i036_A05_1_M033" "i036_A06_1_M041" "i036_A07_1_F001"
males <- geno$Sample[21:27]
print(males)
#> [1] "i036_B09_1_F018" "i036_B10_1_F026" "i036_B11_1_F034" "i036_B12_1_F042"
#> [5] "i036_C01_1_M003" "i036_C02_1_M011" "i036_C03_1_M019"
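The tutorial picks females and males by row position in the genotype table. If your metadata records sex, a sketch like the following could select them by that column instead. The column names Sample and Sex and the codes "F" and "M" are assumptions; which() is used so that rows with a missing Sex are dropped rather than returned as NA.

```r
# Toy metadata; column names and sex codes are hypothetical
meta_toy <- data.frame(
  Sample = c("s1", "s2", "s3", "s4", "s5"),
  Sex = c("F", "M", "F", "M", "M")
)

females_toy <- meta_toy$Sample[which(meta_toy$Sex == "F")]
males_toy <- meta_toy$Sample[which(meta_toy$Sex == "M")]

females_toy  # "s1" "s3"
males_toy    # "s2" "s4" "s5"
```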

# Check Sample IDs
check_samples(DB = DBs, females = females, 
              males = males, used = FALSE)
#> used set to: FALSE. Not checking for repeats
#> All sample IDs present in supplied database

# run samples, rank for each sample
all_matings <- get_all_rankings(DB = DBs, females = females, males = males,
                                type = "all_alleles")
# view one comparison (the first element of the list)
datatable(all_matings[[1]],
          filter = 'top',
          rownames = TRUE,
          extensions = 'Buttons',
          options = list(pageLength = 10,
                         dom = 'Bfrtip', 
                         buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))
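As a rough picture of what ranking candidates for a single female involves, the sketch below turns a vector of hypothetical compatibility scores (higher is better) into a ranked table. It is an illustration only; the actual columns and scoring returned by get_all_rankings() may differ.

```r
# Hypothetical scores for one female against each candidate male
scores <- c(m1 = 2, m2 = 5, m3 = 3, m4 = 5)

ranked <- data.frame(
  male = names(scores),
  score = unname(scores),
  rank = unname(rank(-scores, ties.method = "min"))  # ties share the best rank
)
ranked <- ranked[order(ranked$rank, ranked$male), ]
ranked
```

With ties.method = "min", m2 and m4 both receive rank 1 and the next male receives rank 3.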

Rank Matches

Input:

tips <- rank_all_mates(females, males, ranked_list=all_matings)
datatable(tips,
          filter = 'top',
          rownames = TRUE,
          extensions = 'Buttons',
          options = list(pageLength = 10,
                         dom = 'Bfrtip', 
                         buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))
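One hypothetical way to turn a female-by-male score matrix into one-to-one pairings is a greedy pass: repeatedly take the highest-scoring remaining pair, then remove both individuals. This is not necessarily what rank_all_mates() does, and greedy matching does not guarantee the best total score across all pairs; it is shown only to illustrate the kind of decision involved.

```r
# Hypothetical compatibility scores: rows are females, columns are males
scores <- matrix(c(5, 2, 4,
                   3, 6, 1,
                   4, 4, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("f1", "f2", "f3"), c("m1", "m2", "m3")))

greedy_pairs <- function(scores) {
  pairs <- data.frame(female = character(0), male = character(0), score = numeric(0))
  remaining <- scores
  while (nrow(remaining) > 0 && ncol(remaining) > 0) {
    # Locate the highest remaining score (first one if there are ties)
    best <- which(remaining == max(remaining), arr.ind = TRUE)[1, ]
    pairs <- rbind(pairs, data.frame(
      female = rownames(remaining)[best[1]],
      male = colnames(remaining)[best[2]],
      score = remaining[best[1], best[2]]
    ))
    # Drop both individuals so each is paired at most once
    remaining <- remaining[-best[1], -best[2], drop = FALSE]
  }
  pairs
}

greedy_pairs(scores)
```

Here the pass pairs f2 with m2 (score 6), then f1 with m1 (5), leaving f3 with m3 (2).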